DNA Compressed and Sequence Searching on Multicore
Abstract
One use of string matching is searching for a DNA sequence in a DNA database. This simple operation can take hours or days because of the huge size of DNA sequence databases. At the same time, the potential of multicore processors for DNA sequence searching has not been fully explored, owing to the difficulty of multicore programming. This paper evaluates several key string matching algorithms using a comprehensive simulation framework. Starting from decoding the compressed DNA sequence and profiling processor instructions, the framework constructs task graphs for the string matching algorithms. The task graphs are then mapped onto the multicore processor. The mapping technique is based on a random algorithm that yields high mapping quality. The key feature of this work is that the entire process is automated and requires little understanding from users of the complexity of the algorithms or of the multicore hardware architecture. Compressing the DNA saves up to 75% of the space, and the framework can serve as guidance for using multicore processors to search for DNA patterns.
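The abstract does not say which encoding produces the 75% saving. One common scheme that gives exactly that figure is to store each of the four bases in 2 bits instead of an 8-bit character. The sketch below assumes that scheme and also shows one simple way to split a search into independent overlapping chunks, one per core. The code table and the names `pack`, `unpack`, `chunk_bounds`, and `chunked_search` are hypothetical illustrations, not the paper's framework.

```python
# Assumed 2-bit code table; the paper does not specify its encoding.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack a DNA string into 2 bits per base (four bases per byte)."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out)


def unpack(packed, start, end):
    """Decode bases start..end-1 from a packed buffer."""
    return "".join(
        BASES[(packed[i // 4] >> (2 * (i % 4))) & 3] for i in range(start, end)
    )


def find_all(text, pattern):
    """Return the start offset of every exact match, overlaps included."""
    hits = []
    pos = text.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = text.find(pattern, pos + 1)
    return hits


def chunk_bounds(length, pattern_len, workers):
    """Split [0, length) into chunks that overlap by pattern_len - 1 bases,
    so a match straddling a boundary is still found exactly once."""
    size = -(-length // workers)  # ceiling division
    return [
        (start, min(length, start + size + pattern_len - 1))
        for start in range(0, length, size)
    ]


def chunked_search(packed, length, pattern, workers=4):
    """Decode and search each chunk independently, then merge the offsets.
    The loop runs sequentially here; each iteration is independent work."""
    hits = []
    for start, end in chunk_bounds(length, len(pattern), workers):
        text = unpack(packed, start, end)
        hits.extend(start + i for i in find_all(text, pattern))
    return sorted(hits)


if __name__ == "__main__":
    seq = "ACGTACGTTTACGAC"
    packed = pack(seq)
    print(len(seq), "bases stored in", len(packed), "bytes")
    print(chunked_search(packed, len(seq), "ACG", workers=3))
```

Because every chunk is decoded and searched without reference to the others, the loop body in `chunked_search` is the unit of work that could be handed to separate cores, for example through a process pool.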
Similar resources
Practical aspects of Compressed Suffix Arrays and FM-Index in Searching DNA Sequences
Searching patterns in the DNA sequence is an important step in biological research. To speed up the search process, one can index the DNA sequence. However, classical indexing data structures like suffix trees and suffix arrays are not feasible for indexing DNA sequences due to main memory requirement, as DNA sequences can be very long. In this paper, we evaluate the performance of two compress...
Compressed Domain Scene Change Detection Based on Transform Units Distribution in High Efficiency Video Coding Standard
Scene change detection plays an important role in a number of video applications, including video indexing, searching, browsing, semantic features extraction, and, in general, pre-processing and post-processing operations. Several scene change detection methods have been proposed in different coding standards. Most of them use fixed thresholds for the similarity metrics to determine if there wa...
Project 2: Pattern Matching in Compressed DNA Sequence
Space efficient storage of large genome sequences requires good compression techniques. However, if these sequences need to be decompressed, before any processing can be done over them, the advantage of compression is lost. New techniques are required to extend the traditional pattern matching algorithms to work directly on the compressed sequence. This saves space in memory, requires less disk...
Compression of large DNA databases
The thesis explores algorithms to efficiently store and access repetitive DNA sequence collections produced by large-scale genome sequencing projects. First, existing general-purpose and DNA compression algorithms are evaluated for their suitability for compressing large collections of DNA sequences. Then two novel algorithms for compressing large collections of DNA sequences are introduced. Th...
Fast search in DNA sequence databases using punctuation and indexing
Exact pattern searching in DNA sequence databases has applications in identification of highly conserved regulatory sequences, the design of hybridization probes, and improving performance of approximate homology searching tools such as BLAST and BLAT. We propose a new pattern searching algorithm, CompressedPunctuated-Boyer-Moore (cp-BM), to enhance exact pattern match searches of DNA sequences...
Publication date: 2012